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We present a method for the identification of continuous, spatiotemporal dynamics from ex- 
perimental data. We use a model in the form of a partial differential equation and formulate an 
optimization problem for its estimation from data. The solution is found as a multivariate nonlin- 
ear regression problem using the ACE-algorithm. The procedure is successfully applied to data, 
obtained by simulation of the Swift -Hohenberg equation. There are no restrictions on the dimen- 
sionality of the investigated system, allowing for the analysis of high-dimensional chaotic as well as 
transient dynamics. The demands on the experimental data are discussed as well as the sensitivity 
of the method towards noise. 



The unstable dynamics observed in spatially extended systems attracted huge experimental and theoretical research 
activity in the last decades (see and references therein) . Progress has been achieved by describing the dynamics 
in the vicinity of bifurcations with the help of universal amplitude equations, vastly reducing the complexity of the 
involved models. Additionally, the research has concentrated on the classification of the observed instabilities and 
the resulting patterns, and the investigation of scaling laws and intermittency effects Jj]]. For most investigations 
the models for spatiotemporal systems arise from mainly theoretical considerations and their validity is affirmed by 
the comparison with experimental measurements. Here we address the problem of finding a model, which describes 
the dynamics of an observed system, directly from experimental data. For systems which exhibit temporal low- 
dimensional chaotic motion this was accomplished with the help of nonlinear maps [^) . Other general methods rely 
on some sort of mode-expansion and were successfully applied Q . A different approach consists in a nonparametric 
model identification as proposed for systems with a time delay feedback [||]6) . 

In this letter we extend that approach to the identification of the underlying evolution equation of spatially extended 
systems. At first we formulate an optimization problem for finding a model equation from the data. Then, we rewrite 
the equations in the form of a multivariate nonlinear regression problem. As a last step, we use a novel kind of 
numerical algorithm for solving the problem. Our approach does not include any parameter dependencies, rather 
those are delivered as a by-product. We discuss the identification of homogeneous and autonomous partial differential 
equations, but emphasize that the ideas are quite general and can be applied to other problems like finite-difference 
equations, coupled-map lattices or integro-differential equations. 

We assume the dynamics of the system under consideration to be governed by a PDE of the form 

F[\7,d t ,u(x,t)]=0, (I) 

where u is the field variable with N components, T is an iV-dimensional function of u(x, t) and its spatial and 
temporal derivatives, with t and x the temporal and spatial variables, respectively. 

To ease the analysis and the representation, we only consider systems with a single component u and at most two 
spatial dimensions. We note that the following considerations are valid in principle also for multi-component systems 
and higher dimensions in space. 

In the following, we discuss the procedure to estimate a PDE of the form (Q) from experimental data. Therefore, we 
distinguish between the solution u{x, t) and the data v (x, t). For the sake of simplicity, we denote both the continuous 
space-time variables of the model field and the discrete space-time variables of the data by the same symbols x and 
t. Considering the data v as random variables, we act on a probability space and denote all entities estimated from 
the data by a hat - . 

Since one can get only an estimate for the true function T from the data v, all one can achieve is to estimate the 
analogue of Eq. (|I|) 

T[y,d u v{x,t)] =0, (2) 
where T is the estimate of T and the derivatives have been substituted by estimates computed from the data v. 
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To obtain T , we formulate from (|l|) the following optimization problem 0: 

inf||.F|| =e 2 . (3) 

The optimization lies in varying T until e 2 converges to the infimum. 

The function T is defined as a function of operators, e.g. V 2 , dt, id, . . .. We denote the set of constituting operators 
by {Oi}i=i,...,K- Note that in our definition of the differential operators Oi also any product terms like u 2 d xx ,d t d x 
are included. Since we consider a nonlinear problem, it is useful to split the function J- into sub-functions which 
have as arguments OiU, e.g. §(d xx u) = (d xx u) a or $(idu) = it' 3 . These §i are elements of some class of functions S, 
which has to be specified according to the problem. Thus, the final representation of Eq. ([!]) reads 

K 

T=J2MO l u) = 0. (4) 

i=0 

We want to determine the constituents OiU and the functions $i of T from a data set v. Therefore, we estimate 
at first OiU from the data by finite differencing or alternative schemes. Especially in the presence of noise, Fourier 
methods or kernel estimation could be helpful. The result consists of K random variables Oi = O^u. If the underlying 
PDE is linear, i.e. the function T a multilinear one, we could solve the problem with linear regression methods. 
Secondly, to obtain a solution for the nonlinear problem, we solve 

inf ||f>(0,)||=e 2 (5) 
i=i 

in varying the functions The result are the estimates $j. Up to now, we anticipated to know the number and 
type of operators Oj. For unknown systems, which are the ones we treat, it would be necessary to extend the number 
of operators K to infinity. In practice one has to select a finite number. Redundant terms then deliver <F; m as 
result. It is important to find a reasonable initial guess for the operators Oi of Eq. (Q). If there exists already some 
description for the system in a special state, one starts with the operators which appear in the known equations and 
tries to determine additional terms which may appear when leaving that state. This situation appears e.g. near some 
critical points, where one can start with already derived amplitude equations. 

In the case of data analysis, the optimization problem (|5|) becomes a multiple nonlinear regression problem which 
can be solved using the alternating conditional expectation algorithm (ACE) It has already been successfully 
applied in related fields of data analysis |||9| . In this case the norm of Eq. (H) is the Li norm. Using this algorithm, 
the functions of Eq. (Q) are seen as so-called optimal transformations j8|, which are defined to solve regression 
problems of the form 

K 

£[(*o(<%) - Y, = min ■ (6) 

i=i 

The $ j are varied in the space S of the Borel-measurable functions with the additional requirement of zero expectation, 
E[$i\ = (i = 0, . . . , K), and -E[$ 2 ] = 1- The convergence of the ACE algorithm in the case of discrete samples 
rather than random variables has also been proved in || . The minimization of the expectation value (|^) is equivalent 
to the calculation of the maximal correlation ^ fl0(| . Instead of e 2 we use the maximal correlation as a measure for 
the quality of the result: ^ e [0,1] equals 1 for perfect estimation, the smaller it is the worse is the estimation. 
The ACE-algorithm achieves the maximization by iteratively transforming each Oi by suitable, generally nonlinear, 
transformations such as to yield a linear relationship between the new random variables <$>i{Oi). 

As a first example, we analyze data v(x, t) from the numerical integration of the Swift-Hohenberg equation [pT[ . 
The model is of the form 



" 2 ^ 21 u - u 3 



d t u = [r - (V 2 + k 

J vv 



= (r - k 4 )u - u 3 - 2k 2 (d xx + d yy )u 



(&xxxx ~t~ &yyyy ~t~ ^^xxyy)^ • (7) 

The global dynamics of the model can be derived from a potential, such that the asymptotic time dependence is 
trivial [E[. Therefore, we analyze a transient state to have a sufficient variation in the time derivative. For the 
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identification procedure we use data produced by an explicit Euler integration scheme with a time step of 10 -4 , a 
spatial discretization of Ax = Ay = 0.25, and periodic boundary conditions. As initial conditions we choose uniformly 
distributed independent random numbers from the interval [—10, 10]. The parameters are r — 0.1 and k = 1. 

The differential operators are estimated by symmetric differencing schemes, e.g. dtv(x, t) = (v(x, t + At) — v(x, t — 
At))/2At. Thus, to estimate the time derivatives of first order in each spatial data point, we need three consecutive 
"pictures" of data. The field size is 100 x 100 points, i.e., the data set v(x, t) contains 3 * 10 4 values. The data for 
the central time point are shown in Fig. |[ In the following we can drop without ambiguity the hat ". . 

To identify the unknown system, we use an ansatz with non-mixed terms (like d x v) up to fourth order in the spatial 
derivatives. To show how to handle mixed terms we additionally include the terms vd x v, vd y v, d x vd y v. 

$o{d t v) = + ® 2 (d x v) + ® 3 (d v v) 

+<P 4 (d xx v) + $ 5 (dxyv) + $6(d yy v) 

+ $7(d XXX V) + . . . + <S> W (dyyyV) (8) 

xxxx 

+®w(vd x v) + <f> 17 {vd y v) 
+<f>i 8 (d x vd y v) 

with the nine redundant terms $ 2 {d x v), $ 3 (d y v), $ 5 (d xy v), ® 7 (d xxx v), $g(d xxy v), $g{d xyy v), $w(dyy V v), 
<&ii{d xxxy v), $i4(d xyyy v), $ie(vd x v), &n(vd y v), &is(d x vd y v). We choose this ansatz as a compromise between 
generality and computational effort. Comparing Eq. (Q) with Eq. (||), one expects in particular the following for 
the solution of Eq. (^): Up to an arbitrary common factor, $o should be the identity, $1 should be a polynomial 
of third order, and for i = 4,6, 11, 13, 15 the should be linear functions aiOi. All other estimates should vanish. 
Furthermore, we expect for the slopes of the linear functions, after division by the slope of the l.h.s to remove the 
arbitrary common factor, an — ag = a\ 3 = —2, an — a±s = — 1. 

Performing the ACE-algorithm, we find a maximal correlation of 0.9993 and optimal transformations as shown 
in Fig. |^. All functions approximate the expected shape well, and the terms which were expected to vanish are 
indeed very small compared to the other ones. Note that this does not mean that the values for the redundant terms 
are vanishing themselves but that these arc independent from all the other terms involved. The comparison of the 
slope of the linear functions yields the possibility to estimate parameters; we obtain a^ = a^ = —1.9, ot\3 = —2.0, 
an = ai^ = —1.0. The slopes of the terms which are expected to vanish are all smaller than 0.03 in absolute value. 
Since it is difficult to estimate the errors of these values, it is recommended to check the range of validity of the 
reconstructed model by integration and comparison with the dynamics. In Fig. |3| the nonlinearity — 0.9u — u 3 of 
Eq. (R) is compared explicitly with the estimate, taking now an ansatz with only the non-vanishing terms of the 
above result. Inside a range of 98% of the data values this term can be estimated with high accuracy. 

^,From a practical point of view, it is essential to discuss the stability of the identification method against noise. The 
effect of noise is to increase the errors of the estimates for the partial differential operators applied to the data, and 
therefore also the value of the error estimate e 2 . The identification detects the remaining correlations of the vector 
field and its derivatives via the function T . In general, if the vector field and its derivatives are still strongly correlated 
for a reasonable low noise level of a noisy system, the minimum according to Eq. (|B|) should still be detectable. 

To examine the stability against noise, we disturb each data point v(x,t) by additive Gaussian white noise with 
a standard deviation of 0.5% of the data standard deviation. The estimates of the partial derivatives are disturbed 
severely due to error propagation. In spite of this, using again Eq. (ph, the overall shape of all functions can still be 
recovered satisfactory, while being distorted (Fig. [|). The maximal correlation has decreased to 0.926. 

Two other systems which were analyzed with similar success as the example above, are: 1. The Kuramoto- 
Sivashinsky equation (in the form as given in Ref. in the fully chaotic regime. 2. A reaction-diffusion system with 
two components |{Uq , where the nonlinearity in the inhibitor dynamics is a non-analytic function. We reconstructed 
that function with high accuracy for different states like a single rotating spiral and spiral-defect chaos. 

Summarizing our experiences, we find the following requirements on the data: 1) In order to identify a system with 
N components it is sufficient to measure N independent variables. 2) The sampling of the vector fields in space as 
well as in time has to be appropriate to estimate the partial derivatives with respect to time and space properly. 3) In 
the preceding example we analyzed data at one time point but for an extended region in space. In other situations it 
may be useful to perform the analysis at a single spatial point but for a longer interval in time, or even a combination 
of both procedures. However, as the example shows, even a small amount of data can be sufficient. 4) Since the 
reconstructed functions can only be determined at points which are attained by the data and the estimates for the 
differential operators, the data have to show a sufficient variability in space and time. 5) The inference from the 
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data to a PDE is often not unique. But within the errors of the algorithm, our method reasonably reconstructs the 
dynamics on the trajectory given by the data. 6) We believe that due to the statistical nature of the method a higher 
noise level can typically be compensated by a larger data set. 

We would like to stress the fact that the procedure presented above is essentially independent of the part of phase 
space the system moves on and the boundary and initial conditions. Thus we do not require the dynamics to be close 
to or away from an attractor. This kind of analysis is thus applicable in the case of high-dimensional chaotic motion, as 
exhibited by nonlinear spatially extended systems, as well as transient motion. We did not discuss non-homogeneous 
or non-autonomous evolution laws, but in principle the arguments apply equally well, with some minor modifications, 
for those systems. 

In conclusion, we presented a method for the identification of spatiotemporal systems by numerical reconstruction 
of the PDE which describes the system. The identification procedure consists in solving an optimization problem 
which results in a nonlinear regression problem. Using the ACE-algorithm we showed that this task can indeed be 
solved in the case of numerical examples. We consider the method as a useful tool for the analysis of spatiotemporal 
systems and expect it to find a large variety of applications. The limits of this method and the requirements on the 
data have been discussed. Future work will concentrate on an extension of the ACE-algorithm to the solution of 
multi -component problems, and on the application to real data. Extensions to more general systems are envisaged. 

We acknowledge helpful discussions with H. Kantz, J. Kurths, J. Parisi, A. Pikovsky, M. Zaks, and financial support 
by the Max-Planck-Gesellschaft. 
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FIG. 1. The data sample v(x,y,to) for the central time point to encoded in grey values (small values dark). The horizontal 
and vertical axes are x and y, respectively. 
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FIG. 2. Four exemplary estimates for the optimal transformations $o, 3>4, $5, "l?i6 are shown. The linear occurrence of 
d t u and d xx u is clearly recovered by $0 and $4, respectively. The redundant terms d xy u and ud x u vanish approximately. The 
dotted lines mark the interval on the abscissa in which 98% of the data points are located. Due to a very non-homogeneous 
distribution of the data the optimal transformations outside the marked interval cannot be estimated reliably. The results for 
the optimal transformations not shown here deliver estimates of equal quality. 
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